Pairwise Quantization
We consider the task of lossy compression of high-dimensional vectors through
quantization. We propose an approach that learns quantization parameters by
minimizing the distortion of scalar products and squared distances between
pairs of points. This is in contrast to previous works that obtain these
parameters by minimizing the reconstruction error of individual points. The
proposed approach proceeds by finding a linear transformation of the data that
effectively reduces the minimization of the pairwise distortions to the
minimization of individual reconstruction errors. After such a transformation,
any of the previously proposed quantization approaches can be used. Despite the
simplicity of this transformation, our experiments demonstrate that it achieves
a considerable reduction of the pairwise distortions compared to applying
quantization directly to the untransformed data.
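As an illustration of the reduce-to-reconstruction idea, here is a minimal sketch. The choice of transform T as the matrix square root of the data's second-moment matrix C is an illustrative assumption, not necessarily the paper's exact derivation, and the k-means quantizer simply stands in for "any previously proposed quantization approach":

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))          # toy high-dimensional data

# Assumed transform: T = C^{1/2}, the matrix square root of the
# second-moment matrix E[x x^T] (illustrative, not the paper's formula).
C = X.T @ X / len(X)
w, V = np.linalg.eigh(C)                 # C is symmetric PSD
T = V @ np.diag(np.sqrt(np.maximum(w, 0))) @ V.T

Z = X @ T.T                              # linearly transformed data

# Any standard quantizer can now be trained on Z by minimizing the
# ordinary per-point reconstruction error, e.g. a k-means codebook.
km = KMeans(n_clusters=64, n_init=5, random_state=0).fit(Z)
Z_hat = km.cluster_centers_[km.predict(Z)]
print("mean reconstruction error:",
      np.mean(np.sum((Z - Z_hat) ** 2, axis=1)))
```

After this step, reconstruction error in the transformed space serves as a proxy for the pairwise distortions in the original space.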
Revisiting Pretraining Objectives for Tabular Deep Learning
Recent deep learning models for tabular data compete with traditional ML
models based on gradient-boosted decision trees (GBDT). Unlike GBDT, deep
models can additionally benefit from pretraining, which is a workhorse of DL
for vision and NLP. Several pretraining methods have been proposed for tabular
problems, but it is not entirely clear whether pretraining provides consistent,
noticeable improvements and which method should be used, since the methods are
often not compared to each other, or the comparison is limited to the simplest
MLP architectures.
In this work, we aim to identify best practices for pretraining tabular DL
models that can be universally applied across different datasets and
architectures. Among our findings, we show that using the object target labels
during the pretraining stage is beneficial for downstream performance, and we
advocate several target-aware pretraining objectives. Overall, our experiments
demonstrate that properly performed pretraining significantly increases the
performance of tabular DL models, which often leads to their superiority over
GBDTs.
Comment: Code: https://github.com/puhsu/tabular-dl-pretrain-objective
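To make "target-aware pretraining" concrete, below is a hypothetical sketch of one such objective: alongside reconstructing randomly masked features, the encoder also predicts the object's target label during pretraining. The module names, sizes, masking rate, and the loss weight `alpha` are all illustrative assumptions; see the linked repository for the authors' actual objectives.

```python
import torch
import torch.nn as nn

class TargetAwarePretrainer(nn.Module):
    def __init__(self, n_features: int, n_classes: int, d_hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
        )
        self.reconstruct = nn.Linear(d_hidden, n_features)  # masked-feature head
        self.predict = nn.Linear(d_hidden, n_classes)       # target-aware head

    def forward(self, x, y, mask_prob: float = 0.3, alpha: float = 1.0):
        # Mask a random subset of feature values, encode the corrupted input.
        mask = torch.rand_like(x) < mask_prob
        h = self.encoder(x.masked_fill(mask, 0.0))
        # Reconstruction loss on the masked entries only.
        recon_loss = ((self.reconstruct(h) - x)[mask] ** 2).mean()
        # Target-aware term: predict the label during pretraining.
        target_loss = nn.functional.cross_entropy(self.predict(h), y)
        return recon_loss + alpha * target_loss

model = TargetAwarePretrainer(n_features=20, n_classes=2)
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
loss = model(x, y)
loss.backward()  # one pretraining step; the encoder is later fine-tuned
```

The design point is that the label signal shapes the pretrained representation itself, rather than appearing only at the fine-tuning stage.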